A nearest neighbor approach for stratigraphic 3D modelling: an example from the Llobregat River Delta (Barcelona, NE Spain)

A brief introduction to the use we make of some Python libraries to create 3D images

Although the main objective of this work is to use the knn algorithm to get a 3D geological model of the LRD, the process of obtaining the model implies some techniques and knowledge of Python programing language. We are going to start this notebook with a simple introduction to draw 3D figures with the Python library plotly.graph_objects.

We begin by defining the routes and directories where the data and figures will be located. The figures can be saved in different format: jpg, html, etc. but they can also be shown in this notebook.

We have used Google Earth to draw the delta contour and we have created the file 'deltacontour.csv' with the corresponding data. The data consist in georeferenced points with the UTM_X and UTM_Y coordinates and a fixed height for all points (Cota) that we choose to be 7 meters. We could have taken the real height at each point but this disturbs the image and we prefer to take a flat figure for the delta contour. To handle this kind of data we use the Python library Pandas so we have to import it.

And now we can read the 'deltacontour.csv' file.

We now import the plotting libraries.

We have to adapt and extract from the data we have at each moment what is really needed to draw the figure we want. This can be done directly with Python commands but, since we are going to apply the process many times and to many different data files, we have defined the auxiliary function coordinates(data,positions) to help with this task. For example, to create a 3D figure with the the delta contour file we take data=DATADIR+'deltacontourn.cs and positions=[0,1,2, that are the position were the UTM_X, UTM_Y and Cota are. So we have to input coordinates(DATADIR+'deltacontourn.csv',[0,1,2].

All the auxiliary functions that we need are define in the file functions.py that can be found in the repository. We import all auxiliary functions.

The variable xyzcontour provides the main data for the figure, but we also need to indicate the figure margins, we do this again with an auxiliary function that we call bound.

We are ready to draw the 3D delta contour figure. We can display the figure in this notebook with the figure.show() command or we can save the figure as an HTML file.

Now let's read the data from the boreholes and drop the data that we are not going to use.

We are going to include the locations of the boreholes to a new figure, for that we have to prepare the data and we are going to use the function coordinates for this task. But the first input to this function coordinates is a csv file, so we save the boreholes_dat to a csv file and then apply the coordinates function.

Since the data will get more and more complicated, the number of lines to create the figures will increase more and more, so we have defined auxiliary functions to manipulate the data to make the figure input look as easy as possible.

For example the auxiliary function data_p(list,names,colors,symbols,size) manipulates the data in list (which must be a list of lists of xyx coordinates) to produce the necessary input in a figure to draw a determinate symbol (given in the list simbol) in a determinate position (given in lis) with a specific name in the figure leyend and a specific size (given by the variable size) For example, to produce the same figure 3D Llobregat Delta Contour and boreholes.htm we can also run the following code.

We want to draw only data inside the delta contour in our figures, to do that first we use the Python Geometry library and its Polygon function to create a polygon with the delta contour and then we use the auxiliary function nearby(xyz_list,polygon,distance) that eliminate from xyz_list all those that are not within a distance given by the variable distance of the polygon polygon.

With the data close to the contour we remake the figure 3D Llobregat Delta Contour and boreholes.htm taking only the boreholes near the perimeter contour.

Horizontal sections with the lithosomes predicted by the knn algorithm

The boreholes data is a 3D cloud of points in the LRD, each one with a concrete granulometry. We want to process this data. We start by defining four classes of material depending on the granulometry of the sediments we found at each location, these classes are: clay-silt (< 1 mm), coarse sand (1–5 mm), gravel (> 5mm) and finally the Pliocene of older basement. We also want to classify these materials by horizontal sections or layers separated 5 m one from other (0 to 120 m.b.s.l). We chose the 5 m distance betwenn layers because it is enough representative taking into account the ratio vertical-horizontal (m-km) size of the Quaternary LDR.

We use Pandas commands to do the job but we again define an auxillary function layer_function(data,h) to makes things easier to handle. In this function the variable h indicates the heigth where the layer is located.

The function layer_function(data,h) returns a list of lenght 4. If l_f denotes the output of layer_function(boreholes_data,h),

For example, if we apply the function layer_function to boreholes_data at height -75 m we get.

We have decided to take data every 5 meters of depth. To do this, we define a list with the heights (the Z coordinate) that we delimit between 0 and -120 meters. This list will be used to select from boreholes_data those at each height.

Now that we can split the boreholes_dat by layers and granulometry, we want to predict what material can be found at any position inside the delta contour. But this task is too ambitious and we have to reduce our perspectives, what we do is to define a frame in the x and y axis limits in which we will establish a grid (whose number of rows and columns must be chosen in each case) with the points that we want to predict.

Now we are ready to employ the Python Knn function given by the scikit-learn community.

We define again an auxiliary function to make things easy. This function is knn(boreholes_data,contour_poly,height,axis,grid). Then we apply this function on each lithosome and height on Heights to create the predicted points for each lithosome.

The knn function creates a grid of size grid $\times$ grid limited by the parameters in axis (min X, max X, min Y, max Y). For each point in the grid, it predices the granulometry by looking at the nearest data point (in boreholes_data) in the same layer (determined by height). After that, the prediction of the points outside the polygon contour_poly is set to nan so they are not drawn in the subsequent figures. Finally, a 3-uple is retourned, it contains the following: the first component is a list of lists with the X coordinates of the points in the grid (the same list of X coordinates appear repeated grid times); the second component contains the same for the Y coordinates; the third component is a matrix, where the element at $(i,j)$ contains the prediction for the point $(i,j)$ in the grid, or nan if the point is outside the polygon.

We use the following keys: 0 for gravels, 1 for sands, 2 for clays and 3 for the basement.

Below, an example of this knn function is given for a grid of width 150.

The data stored in the above code lines is in the appropriate shape to be used to draw a figure with any lithosome at any layer. For example, with the next lines of code we are going to draw a figure with all lithosomes found at height -75 m (-75 is the height that appear in the list Heigh with index 15).

We can represent all layers in the same figure.

We want to add to the figure the contour shape at each height, so we need to create the contour data for this.

To maintain the aspect ratio of the HTML image we are also going to add invisible marks at each height in one corner of the image. We don't want to add a name for these marks to the legend so we use the function data_p_nl with is similar to data_p but data_p_n skips the legend.

We are ready to implement the HTML 3d figure 3D_horizontal_sections_LRD.htm, which will be located in the figures directory.

If this figure weights too much to be handled by your browser you can reduce the size of the Heights list by taking, for example, values every 10 meters instead of every 5 meters. The reader may tune this parameter according to the computational resources available, the definition desired for the figure and the number of points in the dataset. If you have a sparser data set, you may want to take a layer every meter (or whatever interval between measurements you have) to use the most number possible of data points to form the lithosomes.

Although we can select individually each of the horizontal planes of the 3D figure 3D_horizontal_sections_LRD.html, simply by tapping the data we want to show in the legend, we are also going to extract 2D figures with each of the layers. We save these figures in the corresponding directory instead of showing them.

In these 2D figures in png format, we represent with circles in different colors the real data taken from borehole_dat, the data predicted by the knn algorithm, the delta contour and the boundary contours where the lithosomes change.

Brown color corresponds to the basement, blue is for clays, yellow for sands and grey for gravels. The dots indicate the poits in borehole_dat.

For example, layer corresponding to height -75 is:

3D stratigraphy of the onshore Quaternary LRD and Pliocene basement top surface.

The objective of this section is to create the 3D figure 3D_lithosomes_ Llobregat_Delta.html that gives a complete view of the basement and lithosomes of the LRD.

The 3D view of the horizontal sections in the LRD allowed us performing a 3D figure for the lithosomes (gravels and coarse sands) and top Pliocene basement surface.

The lithosomes.

By looking at the figure 3D_horizontal_sections_LRD.html with the knn predicions at each height we observe several clusters of material. We are going to use this info to form the lithosomes. We will focus our representation on sands and gravels.

In order to define those clusters of points, we start by selecting from the knn predicted data a point in each one of them. The poits selected for gravels and sand are labeled with the letters p andq respectively.

As before, we use the function data_p to prepare the marks data necessary to add the p's and q's points to the figure.

The lithosomes for gravels and sands will be built from points in p and q respectively. For this, we will apply a nucleation strategy. We start by selecting a point, say p1, and setting a distance, say 400m. Next, we select all grid points classified as gravel by the knn algorithm that are within a distance of 400 m from p1. We put those points in a list say L1 and apply this nucleation strategy to all points in L1 i.e. we put all the points on the grid classified as gravel that are within 400m distance of any point in L1 in a new list L2 and so on recursively until the size of the new list does not increase.

The function grouping applies the above recursive cluster procedure to group the points around a given start point. It is quite inefficient, but its definition is very simple and it gets the job done.

Once the granulometry data of a lithosome is clustered using the nucleation strategy performed by the grouping function. We use the Convex Hull algorithm developed by the SciPy community (https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.ConvexHull.html) to wrap the computed groups. The 3D Convex Hull of a georeferenced dataset is the smallest polyhedron that wraps up all of them. We calculate the Convex Hull of each group we obtain.

To calculate the corresponding Convex Hull we define the function lithosome. This lithosome function uses the grouping function as auxiliary and its output is a list of length four whose elements are the points, vertices, simplices and name respetively of the computed convex hull.

Sometimes we need to remove some points from a group, for example, if the difference in height of the points in the group is too big. This act of elimination will make the output data more suitable. We also have to make proper radio selections to get the right clusters.

We have also reduced the 'grid' parameter from 150 to 50, to reduce computation time. The reader may want to tune this parameter according to the definition desired in the figure and the computational resources available. A grid of 50x50 points produces a decent figure.

Now that we have:

We are ready to prepare the necessary data to implement the 3D HTML figure 3D_lithosomes_Llobregat_Delta.html

The function data_lithosome we defined makes use of the function Mesh3d by plotly.graph_objects to shape the data in the appropriated format to draw the figure. We aplicate data_lithosome to each lithosome.

The figure 3D_lithosomes.html gives a view of the lithosomes.

The basement surface.

We also want to add the Pliocene basement top surface to the 3D figure. We are going to use the knn predictions for de basament but again we reduce the grid size to 50 to reduce the computation time.

Now we zip the data in basement_knn50 to get a list with the coordinates (UTM_X,UTM_Y,Z) and save the output list to a CSV file for processing. You can run the following two code lines just once.

We read the knn predicted data from the CSV file.

We apply the function coordinates to list the UTM_X, UTM_Y and Z coordinates of the predicted data.

There is a lot of data in coordinates_basementknn50. We reduce this data by simply taking the integer part of each UTM_X and UTM_Y coordinate, which means reducing the precision to meters.

Then we calculate the points where the Z coordinate is maximum, that is, the points where the basement is reached first and save the corresponding output data to a CSV file, again the following code lines only need to be run once.

We read the CSV file.

Now we apply the fuction coordinatesand compute new bounds for interpolation.

Now we interpolate the knn predictions.

And we use the function cutting to restrict the data to the delta countourn.

The figure 3D_basement.htm will show only the basement surface.

In the figure 3D_lithosomes_LRD.htm we show the lithosomes and the basement surface.